Skip to content

[WIP] Refactor exec_place to unified grid model#8035

Draft
andralex wants to merge 15 commits intomainfrom
refactor/exec-place-unified-grid
Draft

[WIP] Refactor exec_place to unified grid model#8035
andralex wants to merge 15 commits intomainfrom
refactor/exec-place-unified-grid

Conversation

@andralex
Copy link
Contributor

Summary

This PR refactors exec_place to use a unified grid model where all execution places are treated as grids:

  • Scalar places (host, single device, cuda_stream, green_ctx) are 1-element grids
  • Multi-device grids work as before

Key Changes

  • Added get_dims(), get_place(idx) to exec_place::impl interface
  • Changed activate()/deactivate() to take an index parameter (defaults to 0 for backward compatibility)
  • Moved set_current_place()/unset_current_place()/get_current_place() from exec_place_grid to base exec_place
  • Deprecated is_grid() in favor of size() > 1
  • Updated all client code (stream_task, graph_task, parallel_for_scope, launch, place_partition) to use the new interface

Benefits

  • Eliminates special-casing for grids vs non-grids
  • Allows uniform iteration over any exec_place
  • Cleaner API with consistent indexed access

Status

Work in Progress - needs build verification and testing.

Test plan

  • Build verification
  • Run existing CUDASTF tests
  • Verify no behavioral changes

Made with Cursor

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Mar 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Mar 14, 2026
All execution places are now modeled as grids:
- Scalar places (host, device) are 1-element grids
- Multi-device grids remain as before

Key changes:
- Added get_dims(), get_place(idx) to exec_place::impl
- Changed activate/deactivate to take index parameter
- Moved set_current_place/unset_current_place to exec_place base
- Deprecated is_grid() in favor of size() > 1
- Updated all client code to use new interface

This eliminates special-casing for grids vs non-grids and allows
uniform iteration over any exec_place.

Made-with: Cursor
@andralex andralex force-pushed the refactor/exec-place-unified-grid branch from 74bf808 to dacd49e Compare March 14, 2026 02:06
The shell class added no value - just use the dynamic interface
directly and return exec_place from the factory methods.

Made-with: Cursor
@andralex
Copy link
Contributor Author

/ok to test 6e2df83

Same as exec_place_cuda_stream - the shell class adds no value.
Use the dynamic interface directly and return exec_place from factory.

Made-with: Cursor
@github-actions

This comment has been minimized.

- Remove meaningless const from return-by-value affine_data_place()
- Remove virtual is_grid() from impl - just use size() > 1 directly

Made-with: Cursor
Replace separate operator== and operator< virtual methods with a single
cmp() method that returns -1/0/1, consistent with data_place_interface.

Made-with: Cursor
With the unified grid model, 1-element grids have size()==1 but
is_device()==false. Update place_partition to handle this case by
extracting the underlying scalar place from 1-element grids.

Made-with: Cursor
- Remove virtual is_host()/is_device() from impl; move logic to shell
  (base impl already returns correct values via affine data place)
- Move grid iteration state (current_idx, saved_prev_impl) to grid impl
  since only multi-element grids use iteration
- Add virtual accessors for grid state with assertions for misuse

Made-with: Cursor
@andralex
Copy link
Contributor Author

/ok to test 35c8060

@github-actions

This comment has been minimized.

@andralex andralex force-pushed the refactor/exec-place-unified-grid branch from 187f3d2 to 01fe359 Compare March 16, 2026 18:06
The condition `is_grid()` was changed to `size() > 1` but this broke
1-element grids (e.g. `all_devices()` on single-GPU systems).

Fix 1: Restore is_grid() to detect any grid by checking if
affine_data_place() is invalid (only grids have invalid affine).

Fix 2: Factory functions now collapse 1-element grids to scalars:
- make_grid(), all_devices(), n_devices(), repeat()
- partition_cyclic(), partition_tile()

This ensures that by construction, any true grid has size() > 1,
making `size() > 1` equivalent to `is_grid()` in practice.

Benefits:
- Simpler mental model: grids always have multiple elements
- No edge cases for 1-element grids
- Single-GPU `all_devices()` returns `device(0)` directly

Return type changes (exec_place_grid -> exec_place):
- all_devices(), n_devices(), repeat(), make_grid()
- partition_cyclic(), partition_tile()
- place_partition::as_grid()

Made-with: Cursor
@andralex andralex force-pushed the refactor/exec-place-unified-grid branch from 01fe359 to 672cf33 Compare March 16, 2026 18:53
With 1-element grids now collapsed to scalars by factory functions,
is_grid() is equivalent to size() > 1. Remove the method and use
the simpler size check directly.

Made-with: Cursor
@andralex andralex force-pushed the refactor/exec-place-unified-grid branch from 686889a to 36ea11e Compare March 16, 2026 19:55
@andralex
Copy link
Contributor Author

/ok to test 2779e89

@github-actions

This comment has been minimized.

Update data_place::composite() to accept const exec_place& instead of
const exec_place_grid&. This allows passing exec_place from factory
functions that now return exec_place (like repeat(), all_devices()).

Also update:
- data_place_composite to store exec_place instead of exec_place_grid
- get_grid() to return const exec_place&
- localized_array constructor parameter
- slice interface local variables

Made-with: Cursor
@andralex
Copy link
Contributor Author

/ok to test 383957a

exec_place_grid was a shell class that provided no additional value over
exec_place now that all places support the unified grid model. This change:

- Removes exec_place_grid class, keeps impl as exec_place_grid_impl
- Updates make_grid() to return exec_place directly
- Changes as_grid() to return const exec_place& (just returns *this)
- Updates place_partition constructor to take const exec_place&
- Updates loop_dispatch template parameter to exec_place
- Removes deleted parallel_for(exec_place_grid, ...) overload
- Updates C API (stf.cu) to use exec_place* instead of exec_place_grid*
- Removes forward declarations of exec_place_grid

The C API types (stf_exec_place_grid_handle) are unchanged as they are
opaque void* handles that don't depend on the C++ class name.

Made-with: Cursor
@andralex
Copy link
Contributor Author

/ok to test 6534679

@github-actions

This comment has been minimized.

@andralex
Copy link
Contributor Author

/ok to test 77589ae

@github-actions
Copy link
Contributor

🥳 CI Workflow Results

🟩 Finished in 44m 50s: Pass: 100%/52 | Total: 19h 30m | Max: 37m 02s | Hits: 48%/24201

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

1 participant